πŸ•·οΈοΈ Job Radar β€’ SCRAPING

Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.

upwork.com 🟒 2026-05-08

πŸ”Ή Automate Web Scraping & Document Upload for Pinecone Embeddings
πŸ‘€ Client: πŸ‡ΊπŸ‡Έ United States Member since 2023-09-16
πŸ’° Price: ****
🚩 Problem: Integrate web scraping and document upload into a unified system for an AI knowledge base.
πŸ“¦ Existing: Not specified

Specifications:

[Target] n8n Workflows, FastAPI Backend, Google Sheets API, OpenAI Embeddings, Pinecone Ingestion
[Method] Daily Web Scraping, Document Upload, Error Handling, Logging, Uploading to Pinecone
[UI/UX] Simple UI for Document Upload
[Stack] Python, n8n, FastAPI, PyMuPDF, python-docx, openpyxl, Google Sheets API, OpenAI, Pinecone
[Security] Secure Data Transfer, Error Handling, Logging
[Format] JSON (for n8n workflows), HTTP Endpoints (FastAPI)

Workflow:

1. Schedule daily n8n workflows to scrape target websites.
2. Extract and structure data from web pages using HTML nodes in n8n.
3. Write structured data to Google Sheets via API calls.
4. Detect changes in scraped data and update Google Sheets accordingly.
5. Chunk, embed (using OpenAI embeddings), and upsert data into Pinecone.
6. Implement error handling and logging for all steps.
7. Create a FastAPI endpoint to manually trigger sync with Pinecone.
8. Build a simple UI for uploading PDFs, Word documents, and Excel files.
9. Parse uploaded documents using PyMuPDF, python-docx, openpyxl.
10. Clean text data from parsed documents.
11. Chunk cleaned text into manageable pieces.
12. Generate OpenAI embeddings for each chunk.
13. Upsert embedded chunks to Pinecone via the FastAPI backend.
14. Provide upload status and confirmation messages.

⚑ Receive notifications instantly Join our community.